A statistical perspective on data mining
Abstract
Data mining can be regarded as a collection of methods for drawing inferences from data. The aims of data mining, and some of its methods, overlap with those of classical statistics. However, there are some philosophical and methodological differences. We examine these differences, and we describe three approaches to machine learning that have developed largely independently: classical statistics, Vapnik's statistical learning theory, and computational learning theory. Comparing these approaches, we conclude that statisticians and data miners can profit by studying each other's methods and using a judiciously chosen combination of them.
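The abstract names three approaches but states no formulas. As a rough illustration only, not taken from the paper, the sketch below computes one standard textbook quantity from each: a normal-approximation confidence-interval half-width for an observed error rate (classical statistics), the confidence term of Vapnik's VC generalization bound (statistical learning theory), and the PAC sample size sufficient for a consistent learner over a finite hypothesis class (computational learning theory). The function names and example parameters are hypothetical.

```python
import math


def normal_ci_halfwidth(error_rate: float, n: int, z: float = 1.96) -> float:
    """Classical statistics (illustrative): half-width of the normal-approximation
    confidence interval for an error rate estimated on n test cases.
    z = 1.96 corresponds to roughly 95% coverage."""
    return z * math.sqrt(error_rate * (1.0 - error_rate) / n)


def vc_confidence_term(vc_dim: int, n: int, eta: float) -> float:
    """Vapnik's bound (textbook form): with probability >= 1 - eta,
    true risk <= empirical risk + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n),
    where h is the VC dimension and n the training-set size."""
    return math.sqrt((vc_dim * (math.log(2 * n / vc_dim) + 1) - math.log(eta / 4)) / n)


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Computational learning theory (textbook form): a consistent learner over a
    finite class H reaches error <= epsilon with probability >= 1 - delta once
    m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # Hypothetical settings chosen only to show how each quantity behaves.
    print(normal_ci_halfwidth(0.10, 1000))
    print(vc_confidence_term(10, 1000, 0.05), vc_confidence_term(10, 10000, 0.05))
    print(pac_sample_size(1000, 0.05, 0.01))
```

The three functions answer different questions: the first quantifies uncertainty about one fitted model, while the other two bound how much data is needed before any model chosen from a class can be trusted, which is one way to see the methodological differences the abstract refers to.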
Similar resources
A Geometric View of Similarity Measures in Data Mining
The main objective of data mining is to acquire information from a set of data for prospect applications using a measure. The concerning issue is that one often has to deal with large scale data. Several dimensionality reduction techniques like various feature extraction methods have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...
Statistics and Data Mining
From a statistical perspective Data Mining can be viewed as computer automated exploratory data analysis of large complex data sets. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in Data Mining have so far originated in fields other than Statistics. This report will discuss the discrepancies of these two fields and give a surve...
A Statistical Perspective on Data Mining
Technological advances have led to new and automated data collection methods. Datasets once at a premium are often plentiful nowadays and sometimes indeed massive. A new breed of challenges is thus presented; primary among them is the need for methodology to analyze such masses of data with a view to understanding complex phenomena and relationships. Such capability is provided by data mining...
Survey of Clustering Data Mining Techniques
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns...
22. Bayesian Ying Yang Learning (I): A Unified Perspective for Statistical Modeling
Major dependence structure mining tasks are overviewed from a general statistical learning perspective. Bayesian Ying Yang (BYY) harmony learning has been introduced as a unified framework for mining these dependence structures, with new mechanisms for model selection and regularization on a finite size of samples. Main results are summarized and bibliographic remarks are made. Two typical appr...
Association Rules Network: Definition and Applications
The role of data mining is to search “the space of candidate hypotheses” to offer solutions, whereas the role of statistics is to validate the hypotheses offered by the data-mining process. In this paper we propose Association Rules Networks (ARNs) as a structure for synthesizing, pruning, and analyzing a collection of association rules to construct candidate hypotheses. From a knowledge discov...
Journal title:
- Future Generation Comp. Syst.
Volume 13, Issue
Pages -
Publication date: 1997